Information Classification Extension #785

Closed

matthunt1984

This is the first draft of a data classification extension to CloudEvents.

I've created this PR as a draft to allow debate from JS colleagues or others in the community.

@@ -0,0 +1,35 @@
# Information Classification

ISO27001 control objective A.8.2 addresses 'Information Classification', whereby information and data in an organisation are properly managed, including classification in relation to the sensitivity of the data, legislation, etc. A.8.2.2 requires that electronic assets be 'labelled', and this extension allows the `data` of CloudEvents to be appropriately labelled with the `classification` of the event being shared.
Collaborator

Can you make ISO27001 a link to the spec?

Author

Unfortunately this is one of those specs you need to buy, which is why I didn't link directly. On our internal intranet I linked to some non-normative articles where people have discussed these references.

I can maybe:
  • Link to Wikipedia: https://en.wikipedia.org/wiki/ISO/IEC_27001
  • Copy the link that Wikipedia references for the spec: https://www.bsigroup.com/en-GB/about-bsi/media-centre/press-releases/2005/11/ISOIEC-27001-International-Information-Security-Standard-published/

I think Wikipedia might be better, if we were to link to anything?

@JemDay
Contributor

JemDay commented Mar 23, 2021

Hey @matthunt1984 ..

  • Looks like you need to sign-off your recent commit.
  • I'm curious as to how you see this being used. I understand the need for such data classification internally, but I'm not clear on how this would be used on a per-event basis. Is this type of metadata better represented as part of the event data schema and/or exposed via discovery APIs?

@JemDay
Contributor

JemDay commented Mar 23, 2021

I also still struggle with en-US; I'm waiting for somebody to comment on your commit comment s/Americanised/Americanized ;-)

in uppercase.
- Constraints:
  - REQUIRED
  - MUST be a non-empty string (TBD)
Collaborator

What's the TBD for?

Author

I guess I wanted a second opinion on this. My starting point would be to not allow an empty string unless it had a specific purpose. You could, for example, theoretically have a component that implements the extension but doesn't know how to classify the data after inspection. In such a state you could use "" or maybe an explicit "Unknown".
I think the better implementation is for the field to be omitted, which keeps the data 'unclassified', and therefore I'd suggest "MUST be a non-empty string".
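
The rule proposed in this comment can be sketched as follows. This is only an illustration of the draft constraint as discussed here, not part of any CloudEvents SDK; the function name `read_classification` and the use of a plain dict for the event are assumptions.

```python
from typing import Optional


def read_classification(event: dict) -> Optional[str]:
    """Return the event's classification, or None when the attribute is omitted.

    Hypothetical helper: an omitted attribute means 'unclassified',
    while an empty or non-string value is rejected.
    """
    if "classification" not in event:
        return None  # omitted: the data stays unclassified
    value = event["classification"]
    if not isinstance(value, str) or value == "":
        raise ValueError("classification MUST be a non-empty string")
    return value
```

With this shape a consumer can tell "nobody classified this" (attribute absent) apart from a producer bug (attribute present but empty).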


- Type: `String`
- Description: The `classification` of the `data`. The value SHOULD be expressed
in uppercase.
Contributor

Why put a constraint on the uppercase part? I think the classification labels should be open.

Author

I wasn't sure here. I figured it was better to assume case-insensitivity, as "public" and "PUBLIC", for example, should be treated equally to avoid potential confusion. It's also common practice in a few organizations I've been in to use uppercase on documents, as it makes it clear the word is used as a 'keyword in relation to policy'. Perhaps we could define that the values should be treated case-insensitively, but allow the value to be set either way? I was leaning towards forcing uppercase as it makes it explicit, and the 'SHOULD' allows flexibility if for some reason it were problematic.
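
A minimal sketch of the "emit uppercase, compare case-insensitively" option floated above. Both function names are hypothetical and nothing here is defined by the draft extension itself.

```python
def normalise_classification(value: str) -> str:
    """Producer side: emit the label in uppercase, as the draft says it SHOULD be."""
    return value.upper()


def same_classification(a: str, b: str) -> bool:
    """Consumer side: compare labels without regard to case."""
    return a.casefold() == b.casefold()
```

A producer would call `normalise_classification` before setting the attribute, while a consumer that uses `same_classification` still accepts events from producers that ignored the SHOULD.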


### classification

- Type: `String`
Contributor

It might be useful to allow multiple values

Author

Interesting, I'm not sure the underlying standard supports multiple labels. It's normally a hierarchy and you would therefore label with the most sensitive label. This might be worth validating with someone who knows the underlying spec better.
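
To illustrate the "label with the most sensitive label" idea, here is a small sketch. The four-level hierarchy is invented for the example and is not taken from ISO27001 or from the draft; an organisation would substitute its own ordered list.

```python
# Hypothetical hierarchy, ordered from least to most sensitive.
HIERARCHY = ["PUBLIC", "INTERNAL", "CONFIDENTIAL", "SECRET"]


def most_sensitive(labels: list[str]) -> str:
    """Collapse several candidate labels into the single most sensitive one."""
    if not labels:
        raise ValueError("at least one label is required")
    # list.index raises ValueError for a label outside the hierarchy.
    return max(labels, key=HIERARCHY.index)
```

Under this model a single-valued `classification` attribute is enough, because an event carrying data at several levels simply takes the highest one.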

@tweing
Contributor

tweing commented Mar 28, 2021

We had the exact same discussions in our company. Classification is a very broad concept and it is difficult to apply to the whole payload of the event. We also had discussions on how a consumer would use this attribute and we did not come to a common conclusion. It would be more beneficial to know which attributes of the payload are privacy relevant. On the other hand, it might be handled differently around the globe. It is a difficult topic though.

Long story short, we ended up providing a generic boolean value for `privacyrelevant` to indicate to the consumer whether they should be careful in processing in terms of privacy.

Additionally, we added a generic `labels` array with key/value pairs for anyone who sees a need for classification or any other type of information deemed useful.

We are still in a learning phase whether these attributes make sense or not. Therefore we did not propose them as extensions to CloudEvents yet.
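
As a rough picture of the approach described in this comment, the sketch below shows an event carrying a `privacyrelevant` flag and a `labels` array. The exact shape of `labels` (a list of `{"key": ..., "value": ...}` objects), the sample attribute values, and both helper functions are assumptions made for illustration; the comment does not specify them.

```python
from typing import Optional

# Hypothetical event; only `privacyrelevant` and `labels` come from the comment above.
event = {
    "specversion": "1.0",
    "id": "1234",
    "source": "/example/orders",
    "type": "com.example.order.created",
    "privacyrelevant": True,
    "labels": [{"key": "classification", "value": "INTERNAL"}],
    "data": {"orderId": "42"},
}


def needs_privacy_care(evt: dict) -> bool:
    """Treat a missing flag as 'not privacy relevant' (an assumption)."""
    return bool(evt.get("privacyrelevant", False))


def label_value(evt: dict, key: str) -> Optional[str]:
    """Return the value of the first label with the given key, if any."""
    for label in evt.get("labels", []):
        if label.get("key") == key:
            return label.get("value")
    return None
```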

@matthunt1984
Author

  • I'm curious as to how you see this being used. I understand the need for such data classification internally, but I'm not clear on how this would be used on a per-event basis. Is this type of metadata better represented as part of the event data schema and/or exposed via discovery APIs?
    @JemDay

We had the exact same discussions in our company. Classification is a very broad concept and it is difficult to apply to the whole payload of the event. We also had discussions on how a consumer would use this attribute and we did not come to a common conclusion. It would be more beneficial to know which attributes of the payload are privacy relevant. On the other hand, it might be handled differently around the globe. It is a difficult topic though.

Long story short, we ended up providing a generic boolean value for `privacyrelevant` to indicate to the consumer whether they should be careful in processing in terms of privacy.

Additionally, we added a generic `labels` array with key/value pairs for anyone who sees a need for classification or any other type of information deemed useful.

We are still in a learning phase whether these attributes make sense or not. Therefore we did not propose them as extensions to CloudEvents yet.
@tweing

I think these are some key debate points as to whether this extension gets put into practice. I'm personally a fan of field-level classification on a schema or, where that might not be possible, of trying to do so at interface level (i.e. every message would, in many cases, have the same classification). However, my wider org is also interested in the event-level approach, so I said that I'd show how this would look if we did it as an extension, and it was the first extension use case that had support.

For reference, I did a POC on Avro using a field-level schema with certain fields tagged as Personally Identifiable Information (PII). For me this is the ideal approach for our use case: https://github.com/matthunt1984/avro-generic-anonymiser
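
The field-level idea can be sketched without any Avro library, using a schema written as a plain dict. This is not the code from the linked repository; the `pii` attribute name, the sample schema, and the `anonymise` function are all assumptions chosen for illustration.

```python
# Avro-style record schema as a dict; `pii` is a hypothetical custom field attribute.
SCHEMA = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "email", "type": "string", "pii": True},
        {"name": "country", "type": "string"},
    ],
}


def anonymise(record: dict, schema: dict, mask: str = "***") -> dict:
    """Return a copy of the record with every PII-tagged field masked."""
    pii_fields = {f["name"] for f in schema["fields"] if f.get("pii")}
    return {k: (mask if k in pii_fields else v) for k, v in record.items()}
```

Because the tag lives in the schema rather than on each event, every message published against that schema inherits the same field-level classification.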

@duglin
Collaborator

duglin commented Apr 29, 2021

@matthunt1984 any update on this?

@duglin
Collaborator

duglin commented May 20, 2021

tap tap tap... @matthunt1984 you still interested in this?

@matthunt1984
Author

I've not had much support in my org and I'm still personally in favour of alternatives, so I suggest we close this?


6 participants